A neural analysis-synthesis approach to learning procedural audio models
Effective sound design of environmental sounds is crucial to delivering an immersive experience. Classical Procedural Audio (PA) models were developed to give the sound designer a fast way to synthesize a specific class of environmental sounds in a physically accurate and computationally efficient manner. These models are controllable because their parameters are chosen by analyzing a class of sound. However, the resulting synthesis lacks the fidelity needed for an immersive experience, so the sound designer would rather search an extensive database for real recordings of the target sound class. This thesis proposes the Procedural audio Variational autoEncoder (ProVE), a general framework for developing high-fidelity PA models through data-driven neural audio synthesis, to address the lack of realism in classical PA models. The two-step procedure for training ProVE models is explained through two example sound classes: footsteps and pouring water.
Furthermore, the thesis demonstrates a web application in which users generate footstep sounds by setting control variables for a pretrained ProVE model, showing its capacity for interactive use in sound design workflows. The increase in fidelity from ProVE models is assessed through objective audio evaluations and subjective evaluations against classical PA methods. These results show that learned neural PA models are feasible for sound design projects. The thesis concludes with a discussion of applications and future research directions.
HOSC: A Periodic Activation Function for Preserving Sharp Features in Implicit Neural Representations
Recently proposed methods for implicitly representing signals such as images,
scenes, or geometries using coordinate-based neural network architectures often
do not leverage the choice of activation functions, or do so only to a limited
extent. In this paper, we introduce the Hyperbolic Oscillation function (HOSC),
a novel activation function with a controllable sharpness parameter. Unlike
previous activations, HOSC has been specifically designed to better capture
sudden changes in the input signal, and hence sharp or acute features of the
underlying data, as well as smooth low-frequency transitions. Due to its
simplicity and modularity, HOSC offers a plug-and-play functionality that can
be easily incorporated into any existing method employing a neural network as a
way of implicitly representing a signal. We benchmark HOSC against other
popular activations in an array of general tasks, empirically showing an
improvement in the quality of obtained representations, provide the
mathematical motivation behind the efficacy of HOSC, and discuss its
limitations.
Comment: 12 pages, 7 figures
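
The abstract describes HOSC only as a periodic activation with a controllable sharpness parameter and does not give its formula. The sketch below assumes the form tanh(β · sin(x)), with β as the sharpness parameter. That form, and the names `hosc`, `beta`, and `sample`, are illustrative assumptions and are not taken from the text above. It is meant only to show how a single parameter can move a periodic activation from a smooth, sine-like curve toward sharp, nearly square transitions.

```python
import math


def hosc(x: float, beta: float) -> float:
    """Assumed HOSC form: tanh(beta * sin(x)).

    beta is the (assumed) sharpness parameter: small beta gives a scaled
    sine wave, large beta approaches a square wave with steep transitions.
    """
    return math.tanh(beta * math.sin(x))


def sample(beta: float, n: int = 8) -> list[float]:
    """Evaluate the activation at n evenly spaced points over one period."""
    return [hosc(2.0 * math.pi * k / n, beta) for k in range(n)]


if __name__ == "__main__":
    # Compare a smooth setting with progressively sharper ones.
    for beta in (0.5, 2.0, 16.0):
        print(beta, [round(v, 3) for v in sample(beta)])
```

Under this assumption the output stays bounded in (-1, 1) for any β, and for small β the function is approximately β · sin(x), so one activation can cover both the smooth and the sharp regimes the abstract mentions.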